Predicting C. difficile infection severity from the taxonomic composition of the gut microbiome
Kelly L. Sovacool¹, Sarah E. Tomkovich², Megan L. Coden⁴, Vincent B. Young²,⁴, Krishna Rao⁴, Patrick D. Schloss²,⁵
1 Department of Computational Medicine & Bioinformatics, University of Michigan
2 Department of Microbiology & Immunology, University of Michigan
3 Department of Molecular, Cellular, and Developmental Biology, University of Michigan
4 Division of Infectious Diseases, Department of Internal Medicine, University of Michigan
5 Center for Computational Medicine and Bioinformatics, University of Michigan
Introduction
- C. difficile infection (CDI) can lead to adverse outcomes including recurrent
infections, colectomy, and death (1).
- The composition of the gut microbiome plays an important role in
colonization resistance against C. difficile and in its clearance after
exposure (2, 3).
- Identifying the specific microbiome features that distinguish severe CDI cases could eventually allow clinicians to tailor interventions to each patient’s risk, ultimately improving health outcomes.
Dataset
- We have 16S amplicon sequence data from
1191 CDI patient stool samples, with
cases classified as severe or not severe according to three separate definitions:
- the Infectious Diseases
Society of America (IDSA) definition: white blood cell count ≥ 15 k/μL
or serum creatinine level ≥ 1.5 mg/dL (4).
- the CDC definition (CDI-attributable severity): ICU admission, colectomy, or death within 30 days of CDI, confirmed as attributable to CDI by clinical chart review.
- all-cause severity: ICU admission, colectomy, or death occurring within 30 days of CDI, regardless of the cause.
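The three labeling rules can be written as small predicates. This is a minimal Python sketch, not the study's code: the `CdiCase` record and its field names are hypothetical; following the IDSA/SHEA guideline (4), either elevated lab value is treated as sufficient for IDSA severity; and, as an assumption of this sketch, a case with a qualifying 30-day outcome but no chart-review verdict is left unlabeled (`None`) under the attributable definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CdiCase:
    """Hypothetical per-case record; field names are illustrative only."""
    wbc_k_per_ul: float                  # white blood cell count (k/μL)
    creatinine_mg_dl: float              # serum creatinine (mg/dL)
    outcome_within_30d: bool             # ICU admission, colectomy, or death within 30 days
    attributable_to_cdi: Optional[bool]  # chart-review verdict; None if not reviewed


def idsa_severe(case: CdiCase) -> bool:
    # Either elevated lab value is sufficient (assumed "or", as in the IDSA/SHEA guideline).
    return case.wbc_k_per_ul >= 15 or case.creatinine_mg_dl >= 1.5


def all_cause_severe(case: CdiCase) -> bool:
    # Any qualifying outcome within 30 days counts, whatever its cause.
    return case.outcome_within_30d


def attributable_severe(case: CdiCase) -> Optional[bool]:
    # No qualifying outcome means not severe under this definition.
    if not case.outcome_within_30d:
        return False
    # Assumption: an outcome without a chart-review verdict stays unlabeled.
    if case.attributable_to_cdi is None:
        return None
    return case.attributable_to_cdi


case = CdiCase(wbc_k_per_ul=16.2, creatinine_mg_dl=1.1,
               outcome_within_30d=True, attributable_to_cdi=None)
print(idsa_severe(case), attributable_severe(case), all_cause_severe(case))
# True None True
```

Returning `None` rather than `False` keeps "not reviewed" distinct from "reviewed and not attributable", so unlabeled cases can be dropped before training instead of being counted as non-severe.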
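Number of cases per severity definition:

| Severe | IDSA | Attributable | All-cause |
|--------|------|--------------|-----------|
| no     | 649  | 513          | 1059      |
| yes    | 342  | 26           | 83        |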
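The class balance behind these counts takes only a few lines of arithmetic to check. This is a small Python sketch; the counts are copied from the table, and the column order is assumed to follow the order of the definitions listed above (IDSA, attributable, all-cause). The helper name `severity_summary` is hypothetical.

```python
# Case counts per severity definition, taken from the table.
# Assumption: columns follow the order of the definitions (IDSA, attributable, all-cause).
counts = {
    "IDSA": {"no": 649, "yes": 342},
    "Attributable": {"no": 513, "yes": 26},
    "All-cause": {"no": 1059, "yes": 83},
}


def severity_summary(counts):
    """Return {definition: (number of labeled cases, fraction severe)}."""
    summary = {}
    for name, c in counts.items():
        total = c["no"] + c["yes"]
        summary[name] = (total, c["yes"] / total)
    return summary


for name, (total, frac) in severity_summary(counts).items():
    print(f"{name}: n={total}, {frac:.1%} severe")
# IDSA: n=991, 34.5% severe
# Attributable: n=539, 4.8% severe
# All-cause: n=1142, 7.3% severe
```

None of the labeled totals reaches 1191, so not every sample carries a label under every definition, and the severe class is a small minority under both 30-day outcome definitions.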
Methods
- Sequences were processed with mothur according to the MiSeq SOP and clustered
into de novo OTUs at a 3% distance threshold (5, 6).
- We then trained machine learning (ML) models with OTU abundances as features to
predict the IDSA severity, CDI-attributable severity, and all-cause severity of CDI cases using the mikropml R package (7, 8).
- The dataset was randomly split into training (80%) and testing (20%) sets. Random forest models were trained on the training set with 5-fold cross-validation repeated 100 times, and the performance of the best model was measured on the testing set as the area under the receiver operating characteristic curve (AUROC).
- This was repeated for 100 different random partitions of the data into training and testing sets.
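The resampling scheme can be sketched with the Python standard library. This illustrates the idea only and is not mikropml's R implementation: `train_test_split`, `repeated_kfold`, and the seeds are hypothetical, the split is purely random as described above, and model fitting is omitted. Each of the 100 outer partitions would call `train_test_split` with a different seed.

```python
import random


def train_test_split(n_samples, train_frac=0.8, seed=0):
    """Randomly partition sample indices into training and testing sets."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = round(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]


def repeated_kfold(train_idx, k=5, repeats=100, seed=0):
    """Yield (fit, validate) index lists: k-fold CV over train_idx, repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(train_idx)
        rng.shuffle(shuffled)                       # new fold assignment each repeat
        folds = [shuffled[i::k] for i in range(k)]  # k roughly equal folds
        for i in range(k):
            fit = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield fit, folds[i]


# One outer partition of 1191 samples (the seed is arbitrary).
train, test = train_test_split(1191, seed=1)
cv_splits = list(repeated_kfold(train, k=5, repeats=100, seed=1))
print(len(train), len(test), len(cv_splits))
# 953 238 500
```

The cross-validation folds are drawn only from the training indices, so the testing set stays untouched until the best model is scored on it.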

Results
- This process yielded median AUROC values of TODO
- Feature importance was determined with a permutation test for the best random forest model, revealing that the top 5 OTUs that contributed the most to model performance were TODO.
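AUROC and permutation-based feature importance can be sketched in plain Python. This is a generic illustration, not mikropml's implementation: AUROC is computed through its rank (Mann-Whitney) interpretation, permutation importance is taken as the mean drop in AUROC when one feature's values are shuffled across samples, and `predict`, `n_repeats`, and the toy data are hypothetical. In the study setting, `X` might hold test-set OTU abundances and `predict` might return a trained model's predicted probability of severe CDI.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in AUROC when each feature column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = auroc(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auroc(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def predict(rows):
    # Toy model (hypothetical): scores each case by the first "OTU" only.
    return [row[0] for row in rows]


X = [[0.9, 0.1], [0.8, 0.5], [0.6, 0.3], [0.2, 0.4], [0.1, 0.9], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
baseline, importances = permutation_importance(predict, X, y)
print(baseline, importances[1])
# 1.0 0.0
```

Shuffling the second feature never changes the toy model's scores, so its importance is exactly zero, while shuffling the first feature lowers AUROC. A median AUROC across the 100 outer partitions could then be taken with `statistics.median` over the per-partition test-set values.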


Conclusions
- The modest performance may be improved in future work by training models to
predict clinically confirmed adverse patient outcomes, such as recurrence,
admission to intensive care, colectomy, or death, rather than IDSA severity.
- Predicting a patient’s risk of experiencing a severe CDI and identifying the
specific microbiome features that distinguish severe CDI cases could allow
clinicians to tailor interventions based on each patient’s individual
microbiome, ultimately leading to better health outcomes.
Acknowledgements
This research was supported by National Institutes of Health grants U01AI124255
and the Michigan Institute for Clinical and Health Research Postdoctoral
Translational Scholars Program (UL1TR002240 from the National Center for
Advancing Translational Sciences).
References
1. Kwon JH, Olsen MA, Dubberke ER. 2015. The morbidity, mortality, and costs associated with Clostridium difficile infection. Infect Dis Clin North Am 29:123–134. doi:10.1016/j.idc.2014.11.003.
2. Kociolek LK, Gerding DN. 2016. Breakthroughs in the treatment and prevention of Clostridium difficile infection. Nat Rev Gastroenterol Hepatol 13:150–160. doi:10.1038/nrgastro.2015.220.
3. Guh AY, Kutty PK. 2018. Clostridioides difficile infection. Ann Intern Med 169:ITC49–ITC64. doi:10.7326/AITC201810020.
4. Cohen SH, Gerding DN, Johnson S, Kelly CP, Loo VG, McDonald LC, Pepin J, Wilcox MH, Society for Healthcare Epidemiology of America, Infectious Diseases Society of America. 2010. Clinical practice guidelines for Clostridium difficile infection in adults: 2010 update by the Society for Healthcare Epidemiology of America (SHEA) and the Infectious Diseases Society of America (IDSA). Infect Control Hosp Epidemiol 31:431–455. doi:10.1086/651706.
5. Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD. 2013. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl Environ Microbiol 79:5112–5120. doi:10.1128/AEM.01043-13.
6. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537–7541. doi:10.1128/AEM.01541-09.
7. Topçuoğlu BD, Lesniak NA, Ruffin MT, Wiens J, Schloss PD. 2020. A framework for effective application of machine learning to microbiome-based classification problems. mBio 11. doi:10.1128/mBio.00434-20.
8. Topçuoğlu BD, Lapp Z, Sovacool KL, Snitkin E, Wiens J, Schloss PD. 2021. mikropml: user-friendly R package for supervised machine learning pipelines. J Open Source Softw 6:3073. doi:10.21105/joss.03073.